Tagging a corpus of Malay texts, and coping with ‘syntactic drift’
Abstract
The structure of Malay presents the corpus linguist with an extremely interesting problem. At high syntactic levels, the language is familiar enough: one can talk of direct objects in transitive constructions, and even of agentless passives, and the dominant sentence order is SVO. Parsing at this level is therefore relatively straightforward. The problem lies at lower levels, where Malay patterns quite differently from Indo-European languages. If the linguist tries to process Malay using categories and techniques designed for Indo-European languages, Malay appears at best confusing and at worst chaotic. Malay is neither confusing nor chaotic, but it does need to be analysed using techniques that are sensitive to its own patterns.
Similar resources
Processing Natural Malay Texts: a Data-driven Approach
This research represents the first attempt to produce a working system for the automatic processing of texts of Bahasa Melayu ‘Malay’. At the heart of the system is an integrated relational lexical database called MALEX, which draws on the experience of working on English and other languages, but which is specifically tailored to the conditions of Malay. The development of the database is from ...
A Syntactically and Semantically Tagged Corpus of Russian: State of the Art and Prospects
We describe a project aimed at creating a deeply annotated corpus of Russian texts. The annotation consists of comprehensive morphological marking, syntactic tagging in the form of a complete dependency tree, and semantic tagging within a restricted semantic dictionary. Syntactic tagging uses about 80 dependency relations. The syntactically annotated corpus counts more than 28,000 sentences...
Feature extraction in opinion mining through Persian reviews
Opinion mining deals with the analysis of user reviews to extract users' opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in that area. In general, opinion mining analyses user reviews at three levels: document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...
Part of Speech Tagger for Malay Language Based on Words Morphology
Mohd Pouzi Hamzah, Syarifah fatem Na’imah Binti Syed Kamaruddin (School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia). Abstract: Part of Speech (POS) tagging is an essential task in pre-processing for text process...
An Approach to Proper Name Tagging for German
This paper presents an incremental method for the tagging of proper names in German newspaper texts. The tagging is performed by the analysis of the syntactic and textual contexts of proper names together with a morphological analysis. The proper names selected by this process supply new contexts which can be used for finding new proper names, and so on. This procedure was applied to a small Ge...